Wikipedia:Village pump (technical)

From Assamese Wikipedia
(Redirected from Wikipedia:VPT)

Note: Inactive discussions, whether concluded or not, will be archived.

Proposal to enable an experimental open source machine translation system for Assamese

Content translation has already been used by many communities to translate more than half a million Wikipedia articles, and we want to help communities with potential to grow by using translation more. We propose enabling experimental support in Content Translation for a basic machine translation system for Assamese that could improve over time with the help of the community.

Background

For some languages, Content translation offers Machine Translation as a starting point for users to edit and improve. In addition, the tool provides mechanisms to encourage the creation of good quality content, preventing the publication of lightly edited machine translations. As a result, the translations produced with the tool are less likely to be deleted than the articles started from scratch.

Unfortunately, Assamese is not supported by the translation services integrated in the tool such as Apertium, Google or Yandex. This requires editors to write their translations from scratch.

Proposal

We want to propose the integration of OpusMT, an Open source Neural Machine Translation system that supports the English-Assamese language pair (you can try the translation engine in this example website). Although the translation service is based on MarianMT, a powerful neural network system, the quality of the final translations heavily depends on the availability of translation data which is limited for Assamese at the moment. This means that:

  • The translation quality will be very low initially. When using the translation service you should expect to still rewrite most of the content. This may not be an extra effort compared to the current situation, given that users have to write their translations from scratch since there is no alternative Machine Translation system available.
  • Quality will improve over time as more articles are translated. Each translation created with Content Translation (using machine translation or from scratch) will contribute to the global corpus of translation examples used by the system. Thus, the more translations that are created with the tool, the better the machine translation service will become.

We believe that this approach will help the less-supported languages over time. We want to check whether this plan sounds good to the Assamese community, and to encourage Assamese editors to translate Wikipedia articles with the tool. In this way, your translations not only help expand the knowledge available in your language, but also become a way to improve the open source machine translation service that can be used inside and outside Wikipedia.

Note that the usage of OpusMT machine translation is optional and editors are free to translate without using such services. In order to facilitate access to the tool we also plan to enable Content translation by default on Assamese Wikipedia. That will make it easy for users to discover the tool through several entry points. However, users not interested in translation will still be able to disable it from their preferences.

Impact

On Assamese Wikipedia, 198 articles have been created with Content Translation since 2015 (compared to the 6.7 thousand articles created from scratch in the same period). So we don’t expect a high volume of new content as an immediate result of this experiment. In any case, we want to encourage editors to report any issues they find with the tool while creating or reviewing translations.

In order to make sure that the content created using the new machine translation system results in a productive contribution, we’ll keep track of the translations created with the tool and their deletion ratios. In addition, we’ll set stricter limits on machine translation to enforce more intensive editing of the initial low quality machine translation.

We want to make sure the expectations are clear before enabling the new system. Please share your questions and your thoughts on the idea. We’ll proceed only if there are no major concerns from the community.

Thanks! --Pginer-WMF (talk) 08:51, 17 January 2020 (UTC)

Hello everyone. We have enabled OpusMT for Assamese in Content translation. As we mentioned in last month's announcement, this is an experimental translation service that will provide low quality translations initially but will improve over time as users translate new articles and make corrections.
We have adjusted the translation limits to make sure that articles are not published with more than 70% of unmodified machine translation. We'll be monitoring the statistics for Assamese as well as the list of articles created with the tool. Please share your experience creating and reviewing articles created with the tool, and let us know how we can support you better.
Thanks! --Pginer-WMF (talk) 10:15, 19 February 2020 (UTC)
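To make the 70% limit concrete, here is a minimal sketch of how such a check could work. The announcement does not say how Content Translation measures unmodified text, so the whitespace tokenization, the use of Python's `difflib.SequenceMatcher`, and the names `unmodified_fraction` and `may_publish` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# From the announcement: at most 70% unmodified machine translation.
MT_LIMIT = 0.70


def unmodified_fraction(machine_text: str, edited_text: str) -> float:
    """Share of machine-translated tokens that survive unchanged after editing.

    Assumption: tokens are whitespace-separated words. The real tool may
    tokenize and compare differently.
    """
    mt_tokens = machine_text.split()
    edited_tokens = edited_text.split()
    if not mt_tokens:
        return 0.0
    matcher = SequenceMatcher(a=mt_tokens, b=edited_tokens, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(mt_tokens)


def may_publish(machine_text: str, edited_text: str, limit: float = MT_LIMIT) -> bool:
    """Hypothetical gate: allow publishing only at or below the limit."""
    return unmodified_fraction(machine_text, edited_text) <= limit
```

Under these assumptions, an editor who leaves the machine output untouched gets a fraction of 1.0 and is blocked, while one who rewrites a third of the words falls under the limit.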
@Pginer-WMF: Thank you for your effort. I tried using Content Translation, but unfortunately the translated text is not even close to the original text. It just gives random text (surprisingly, I mostly saw text from the Bible). I understand that we need to use this application more and more to train the machine translation engine and eventually get satisfactory results. However, given the current state of the application and after talking to our community members, many are discouraged from using it, as it doesn't translate even one word correctly. I wonder what corpus was used to train the engines. Was any native Assamese speaker involved in the process? Could you give us some details on this? --SlowPhoton (talk) 06:49, 20 February 2020 (UTC)
@SlowPhoton: Thanks for the feedback. We are using models from the Opus (Open Parallel Corpus) project from the University of Helsinki. They compile and process freely licensed content available on the web. One of those sources is the translations created with Content Translation itself (about 200 articles for Assamese), but the Opus project includes many other sources (I checked, and the Bible seems to be another one). I don't know how many of those resources include Assamese content, how much information they include, or how reliable it is, but since it is based on open sources there may be ways to help improve it. The ways I can think of for the Assamese community to help improve the corpus are: (a) review the current corpus to find mistakes in the translations and share them with the Opus project maintainer, Jorg Tiedemann (jorg.tiedemann@helsinki.fi); (b) share with Jorg any additional open corpus for Assamese that may be available on the internet for him to include; (c) contribute to any of the projects already integrated in Opus, such as Content Translation (or Ubuntu/Gnome localizations, etc.), to help expand the translations available for Assamese. Native Assamese knowledge is needed for many of the activities that could help improve support, but if there is anything we could do to support the community in this, please let us know. Thanks! --Pginer-WMF (talk) 09:55, 21 February 2020 (UTC)

Editing news 2020 #1 – Discussion tools

19:24, 8 April 2020 (UTC)

Text Cleaner bot

Please do not edit this section. The discussion is closed.

Respected Wikimedians, in view of the need for a locally operated bot on Assamese Wikipedia to fix a few common mistakes that occur when writing Assamese, I have built a semi-automated bot named TextCleanerBot. Assamese Wikipedia previously had a bot named Ucodebot, which replaced Bengali র with Assamese ৰ. It is currently inactive. Besides replacing Bengali র with Assamese ৰ, TextCleanerBot also corrects some punctuation marks and errors in the ṇatva and ṣatva spelling rules. For this it uses the database of Lachit Pad, the spelling-error corrector by Mridul Kumar Sharmah. The bot is currently being run on a trial basis, and the results have been largely as hoped. It will be improved further in the coming days. Running the bot locally lets us keep the database updated as needed in the future. Your support is needed to establish the bot permanently on Assamese Wikipedia. Please give your opinion below and sign. If you notice any errors made by the bot, please report those below as well.

Dear Wikimedians, as we have seen in many articles on Assamese Wikipedia, trivial mistakes and typos are present all over. With that in mind, I developed a semi-automated bot called TextCleanerBot to correct some common mistakes, such as replacing Bengali র with Assamese ৰ, punctuation-related mistakes, and some other Assamese grammatical errors. I am thankful to Mridul Kumar Sharmah for letting me use his database for Lachit pad, an online sanitizer for Assamese texts. Earlier there was a bot called Ucodebot, which replaced Bengali র with Assamese ৰ, but unfortunately it is no longer active. The idea behind having a locally run bot is that we as a community can keep updating the database as needed. I have already done some test runs, and the results are mostly positive, with occasional glitches. I will keep improving the bot as we go. Please provide your feedback on whether you support or oppose such a bot on Assamese Wikipedia. If you see any errors made by the bot, please report them as well. --SlowPhoton (talk) 07:41, 14 May 2020 (UTC)
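The core replacement described above can be sketched in a few lines. This is not TextCleanerBot's actual code, which is not shown in this discussion: the function name `clean_text` and the optional `corrections` mapping (a stand-in for a word list such as the Lachit pad database) are hypothetical. The two code points are the Unicode characters BENGALI LETTER RA (U+09B0) and BENGALI LETTER RA WITH MIDDLE DIAGONAL (U+09F0), the latter being the Assamese ৰ.

```python
from typing import Dict, Optional

BENGALI_RA = "\u09b0"   # র
ASSAMESE_RA = "\u09f0"  # ৰ

# Single-character translation table for the র -> ৰ substitution.
_RA_TABLE = str.maketrans({BENGALI_RA: ASSAMESE_RA})


def clean_text(text: str, corrections: Optional[Dict[str, str]] = None) -> str:
    """Replace Bengali র with Assamese ৰ, then apply optional corrections.

    `corrections` is a hypothetical mapping of misspelled strings to their
    fixes, supplied by the caller; no real word list is bundled here.
    """
    cleaned = text.translate(_RA_TABLE)
    for wrong, right in (corrections or {}).items():
        cleaned = cleaned.replace(wrong, right)
    return cleaned
```

A production bot would presumably need to skip passages that are legitimately in Bengali (quotations, interlanguage links, file names), which is one reason a semi-automated, human-reviewed run is a sensible design.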

Support

  1. চাণক্য (talk) 07:40, 14 May 2020 (UTC)
  2. - অজয় (let's discuss) 08:23, 14 May 2020 (UTC)
  3. দিব্য দত্ত (talk) 11:34, 14 May 2020 (UTC)
  4. --Chiring chandan (talk) 14:10, 14 May 202 (UTC)
  5. -- গীতাৰ্থ বৰদলৈ (talk) 12:23, 20 May 2020 (UTC)
  6. -- মৃদুল কুমাৰ শৰ্মা (talk) 01:53, 21 May 2020 (UTC)
  7. Nayan j Nath (talk) 18:14, 1 June 2020 (UTC)
  8. ড০ ৰবীন জামান (talk) 18:22, 1 June 2020 (UTC)

Oppose

Comment

TextCleanerBot has been granted bot rights. See the discussion on Meta-Wiki. --SlowPhoton (talk) 03:04, 23 May 2020 (UTC)

Replacement tool

A bot has already been started to fix some errors in previously created articles. In addition, to fix Bengali র and other minor errors in articles that are being newly written, I propose enabling the replacement tool built by User:দিব্য দত্ত (already used on Wikisource) for all users on Assamese Wikipedia as well. Because changing the common.js file affects everyone's interface, members need to express their support before the tool is added. -- গীতাৰ্থ বৰদলৈ (talk) 18:12, 1 June 2020 (UTC)

Support

  1. I believe that installing it will benefit users of every kind of keyboard. --ন্দন চিৰিং ফুকন 18:17, 1 June 2020 (UTC)
  2. বিশ্বজিৎ বৈশ্য (talk) 18:24, 1 June 2020 (UTC)
  3. Work on integrating Lachit Spell Checker into Wikipedia is in progress. Once that is done, members will be able to remove spelling errors before publishing an article. Mridul Kumar Sharmah (talk) 00:04, 2 June 2020 (UTC)
  4. হানিফ আলী (user) 00:50, 2 June 2020 (UTC)
  5. দিব্য দত্ত (talk) 03:21, 2 June 2020 (UTC)
  6. ঈশান জ্যোতি বৰা (talk) 07:02, 2 June 2020 (UTC)
  7. I like the tech bot that has already been started. I am hopeful that the replacement tool will also be helpful to everyone. Priyankush Deka (talk) 11:15, 2 June 2020 (UTC)
  8. I had keenly felt the lack of a tool like this. - অজয় (let's discuss) 13:47, 2 June 2020 (UTC)

Oppose

Comment

It would be better to integrate User:Mridul Kumar Sharmah's Lachit spelling corrector into the wiki. Many common mistakes that he has collected over a long time could then be corrected easily. He has already provided the data. --দিব্য দত্ত (talk) 18:20, 1 June 2020 (UTC)
Work on integrating Lachit Spell Checker into Wikipedia is in progress. Once that is done, members will be able to remove spelling errors before publishing an article. Mridul Kumar Sharmah (talk) 00:04, 2 June 2020 (UTC)
I support integrating the Lachit spelling corrector. However, I think it would be better not to enable it by default, and instead to let users enable it as a gadget from their preferences. If it is on by default, new editors may not understand well how to use it. As a gadget, it would be best kept in beta, because in some cases the spelling corrector does not work correctly; it should therefore first be run on a trial basis as a beta. It can be improved over time. --SlowPhoton (talk) 05:47, 2 June 2020 (UTC)
Yes, it should be kept as a gadget at first, and made the default for everyone later, once it has improved. দিব্য দত্ত (talk) 12:46, 2 June 2020 (UTC)

Editing news 2020 #2

20:32, 17 June 2020 (UTC)

Editing news 2020 #3

12:55, 9 July 2020 (UTC)

Improving the translation support for the Assamese Wikipedia

Hi!

Content translation has been successful in supporting the translation process on many Wikipedia communities, and we want to help additional wikis with potential to grow using translation as part of a new initiative.

Content translation facilitates the creation of Wikipedia articles by translating content from other languages. It has been used already to create more than half a million articles. In addition, the tool provides mechanisms to encourage the creation of good quality content, preventing the publication of lightly edited machine translations. In general, our analysis shows that the translations produced are less likely to be deleted than the articles started from scratch.

Assamese Wikipedia editors have used Content translation to create more than three hundred articles. Given the size of the editing community, we think that there is potential to use translation to create more articles, expand existing ones, and attract new editors that learn how to make productive edits. Translation can help the community to reduce the language gap with other languages and grow the number of editors in a sustainable way. In order to achieve this goal, we want to collaborate with you to make Content translation more visible in the Assamese Wikipedia and support new ways to translate.

As a first step, during the next weeks we plan to enable Content translation by default on the Assamese Wikipedia. That will make it easy for users to discover the tool through several entry points. However, users not interested in translation will still be able to disable it from their preferences.

Please feel free to share any comment in this conversation thread.

Thanks! --Amir E. Aharoni / আমীৰ এ. আহৰোনি (WMF) (talk) 07:26, 10 August 2020 (UTC)

Yes, that would be a welcome move. We tried to put Content Translation under Tools but couldn't do so. Thank you. -- গীতাৰ্থ বৰদলৈ (talk) 09:35, 26 August 2020 (UTC)